empirical process
Non-asymptotic estimates of the minimal risk in statistical learning
In this paper we prove some concentration inequalities for two types of error probabilities in the Empirical Risk Principle (ERP) in statistical learning, which provide a lower bound and an upper bound for the minimal risk (in terms of the minimal empirical risk) with non-asymptotic high confidence. The usual boundedness condition of the empirical risk function is relaxed to the Gaussian or exponential integrability condition. The confidence of the lower bound of the minimal risk is shown to be independent of the number of training parameters and the dimension of the input vectors, allowing one to detect the deficiency of a learning machine efficiently; and the confidence of the upper bound of the minimal risk is proved to be high provided that the sample size $n$ is much greater than the box dimension of the parameter set $Θ$ in the Orlicz metric $d_{ψ_1}$ associated with the risk functions. Our work is based on Talagrand's concentration inequalities (the sharp versions by Bousquet and Klein-Rio), transport-entropy inequalities and the recent progress in the theory of empirical processes and statistical learning.
Appendix for " Beyond the Signs: Nonparametric Tensor Completion via Sign Series "
The appendix consists of proofs (Section A), additional theoretical results (Section B), and numerical experiments (Section C). When g is strictly increasing, the mapping x7 g(x) is sign preserving. Specifically, if x 0, then g(x) g(0) = 0. Conversely, ifg(x) 0 = g(0), then applying g 1 to both sides givesx 0. When g is strictly decreasing, the mappingx7 g(x) is sign reversing. See Section B.2 for constructive examples. Based on the definition of classification lossL(,), the function Risk() relies only on the sign pattern of the tensor.
Doubly Wild Refitting: Model-Free Evaluation of High Dimensional Black-Box Predictions under Convex Losses
Hu, Haichen, Simchi-Levi, David
We study the problem of excess risk evaluation for empirical risk minimization (ERM) under general convex loss functions. Our contribution is an efficient refitting procedure that computes the excess risk and provides high-probability upper bounds under the fixed-design setting. Assuming only black-box access to the training algorithm and a single dataset, we begin by generating two sets of artificially modified pseudo-outcomes--termed wild responses--created by stochastically perturbing the gradient vectors with carefully chosen scaling. Using these two pseudo-labeled datasets, we then refit the black-box procedure twice to obtain two corresponding wild predictors. Finally, leveraging the original predictor, the two wild predictors, and the constructed wild responses, we derive an efficient excess-risk upper bound. A key feature of our analysis is that it requires no prior knowledge of the complexity of the underlying function class. As a result, the method is essentially model-free and holds significant promise for theoretically evaluating modern opaque machine learning systems--such as deep neural networks and generative models--where traditional capacity-based learning theory becomes infeasible due to the extreme complexity of the hypothesis class.
Supplementary Material Additional Notation
A.1 Robust Mean Estimation from Subset Stability The upper bound is always less than null for m n . Let m be the largest value of f (x) for any x T with w ( x) null= 0 . Thus, by the weighted version of Lemma 2.4 of [DK19], we have that nullµ Section B.1, we show a result stating that pre-processing on i.i.d. points yields a set that contains Then, in Section B.2, we use a coupling argument to show a We recall the median of means principle. We now state our main result in this section, proved using minimax duality, that Theorem B.1 implies We first consider the case of i.i.d. In particular, Lemma E.2 shows that we can deterministically round We now prove Theorem 1.7, i.e., stability of a subset after corruption, using Theorem B.2.
Appendix for " Beyond the Signs: Nonparametric Tensor Completion via Sign Series "
See Section B.2 for constructive examples.Proof of Proposition 2. Based on (3) in Proposition 2, we have Risk( Z) Risk( Θ) = E null |sgnZ sgn Θ|| Θ|null . We divide the proof into two cases: α > 0 and α = . The inequality (6) now becomes Risk( Z) Risk( Θ) t null MAE(sgn Θ, sgnZ) C snull, for all 0 t < ρ(π, N) . Consider the same setup as in Theorem 2. Fix The conclusion (10) then directly follows by applying Remark A.1 to (11). 3 Proof of Theorem 2. To simplify the notation, we denote ρ = ρ(π, N). It follows from Kosorok (2007, Theorem 9.22) that the Proof of Theorem 3. By definition of ˆ Θ, we have MAE( ˆ Θ, Θ) = E null null null null null 1 2H + 1 null Assumption A.1, we establish the estimation accuracy guarantee for the large-margin estimators H log H. (29) In particualr, setting H null (1 + |N|) To apply Theorem A.1, we choose the pair ( L Here, we describe the details of the example set-up.